{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>.container { width:100% !important; }</style>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from IPython.core.display import display, HTML\n",
    "display(HTML(\"<style>.container { width:100% !important; }</style>\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Table of Contents\n",
    "* [Lecture 10 - Introduction to Time Series Data](#Lecture-8---Introduction-to-Time-Series-Data)\n",
    "\t* &nbsp;\n",
    "\t\t* [Content](#Content)\n",
    "\t\t* [Learning Outcomes](#Learning-Outcomes)\n",
    "* [Importing Time Series Data](#Importing-Time-Series-Data)\n",
    "* [Converting into Time Series Data](#Converting-into-Time-Series-Data)\n",
    "* [Filtering Time Series Data](#Filtering-Time-Series-Data)\n",
    "* [Resampling](#Resampling)\n",
    "\t* &nbsp;\n",
    "\t\t* [Moving (rolling/running) statistics](#Moving-%28rolling/running%29-statistics)\n",
    "\t* [Shift operations](#Shift-operations)\n",
    "\t\t* [Exercise:](#Exercise:)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lecture 10 - Introduction to Time Series Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Content"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1. Importing time series data\n",
    "2. Time series data types and conversions\n",
    "3. Time series filtering\n",
    "4. Time series resampling\n",
    "5. Plotting time series"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Learning Outcomes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At the end of this lecture, you should be able to:\n",
    "\n",
    "* import time series data\n",
    "* convert datasets into appropriate time series data types\n",
    "* filter dataframes based on time series conditions\n",
    "* perform resampling of time series data\n",
    "* perform running averages on time series data\n",
    "* visualise time series data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The overall goal of Pandas is that of becoming \"the most powerful and flexible open source data analysis manipulation tool available in any language\", and it is already well on its way toward realizing this. One of the domains where Pandas has been excelling and has become a proven a tool is in the domain of time series data analysis. \n",
    "\n",
    "Time series data is a sequence of data points that comprises of measurements made over a time interval, where the time interval is continuous, having the same distance between consecutive data points, while generating at most one data point for each given moment in time.\n",
    "\n",
    "Time series analysis is an substantive topic. The aim here will be to provide a brief introduction on how to process, manipulate and visualise time series data using a small subset of Pandas capabilities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pylab import rcParams #this module gives us some controls over plot rendering attributes\n",
    "rcParams['figure.figsize'] = 15, 10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "#this line enables the plots to be embedded into the notebook\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Importing Time Series Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below is a dataset extracted from Yahoo Finance showing the daily Apple stock price movements from 1980 to February 2016."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Date</th>\n",
       "      <th>Open</th>\n",
       "      <th>High</th>\n",
       "      <th>Low</th>\n",
       "      <th>Close</th>\n",
       "      <th>Volume</th>\n",
       "      <th>Adj Close</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2016-02-03</td>\n",
       "      <td>95.000000</td>\n",
       "      <td>96.839996</td>\n",
       "      <td>94.080002</td>\n",
       "      <td>96.349998</td>\n",
       "      <td>45964300</td>\n",
       "      <td>95.830001</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2016-02-02</td>\n",
       "      <td>95.419998</td>\n",
       "      <td>96.040001</td>\n",
       "      <td>94.279999</td>\n",
       "      <td>94.480003</td>\n",
       "      <td>37357200</td>\n",
       "      <td>93.970098</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2016-02-01</td>\n",
       "      <td>96.470001</td>\n",
       "      <td>96.709999</td>\n",
       "      <td>95.400002</td>\n",
       "      <td>96.430000</td>\n",
       "      <td>40943500</td>\n",
       "      <td>95.909571</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2016-01-29</td>\n",
       "      <td>94.790001</td>\n",
       "      <td>97.339996</td>\n",
       "      <td>94.349998</td>\n",
       "      <td>97.339996</td>\n",
       "      <td>64416500</td>\n",
       "      <td>96.814656</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2016-01-28</td>\n",
       "      <td>93.790001</td>\n",
       "      <td>94.519997</td>\n",
       "      <td>92.389999</td>\n",
       "      <td>94.089996</td>\n",
       "      <td>55678800</td>\n",
       "      <td>93.582196</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         Date       Open       High        Low      Close    Volume  Adj Close\n",
       "0  2016-02-03  95.000000  96.839996  94.080002  96.349998  45964300  95.830001\n",
       "1  2016-02-02  95.419998  96.040001  94.279999  94.480003  37357200  93.970098\n",
       "2  2016-02-01  96.470001  96.709999  95.400002  96.430000  40943500  95.909571\n",
       "3  2016-01-29  94.790001  97.339996  94.349998  97.339996  64416500  96.814656\n",
       "4  2016-01-28  93.790001  94.519997  92.389999  94.089996  55678800  93.582196"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ts_data = pd.read_csv('../datasets/appleStockPrice.csv')\n",
    "ts_data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Date</th>\n",
       "      <th>Open</th>\n",
       "      <th>High</th>\n",
       "      <th>Low</th>\n",
       "      <th>Close</th>\n",
       "      <th>Volume</th>\n",
       "      <th>Adj Close</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>8857</th>\n",
       "      <td>1980-12-18</td>\n",
       "      <td>26.625000</td>\n",
       "      <td>26.750000</td>\n",
       "      <td>26.625</td>\n",
       "      <td>26.625</td>\n",
       "      <td>18362400</td>\n",
       "      <td>0.401907</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8858</th>\n",
       "      <td>1980-12-17</td>\n",
       "      <td>25.875000</td>\n",
       "      <td>25.999999</td>\n",
       "      <td>25.875</td>\n",
       "      <td>25.875</td>\n",
       "      <td>21610400</td>\n",
       "      <td>0.390586</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8859</th>\n",
       "      <td>1980-12-16</td>\n",
       "      <td>25.375000</td>\n",
       "      <td>25.375000</td>\n",
       "      <td>25.250</td>\n",
       "      <td>25.250</td>\n",
       "      <td>26432000</td>\n",
       "      <td>0.381151</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8860</th>\n",
       "      <td>1980-12-15</td>\n",
       "      <td>27.375001</td>\n",
       "      <td>27.375001</td>\n",
       "      <td>27.250</td>\n",
       "      <td>27.250</td>\n",
       "      <td>43971200</td>\n",
       "      <td>0.411342</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8861</th>\n",
       "      <td>1980-12-12</td>\n",
       "      <td>28.750000</td>\n",
       "      <td>28.875000</td>\n",
       "      <td>28.750</td>\n",
       "      <td>28.750</td>\n",
       "      <td>117258400</td>\n",
       "      <td>0.433984</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            Date       Open       High     Low   Close     Volume  Adj Close\n",
       "8857  1980-12-18  26.625000  26.750000  26.625  26.625   18362400   0.401907\n",
       "8858  1980-12-17  25.875000  25.999999  25.875  25.875   21610400   0.390586\n",
       "8859  1980-12-16  25.375000  25.375000  25.250  25.250   26432000   0.381151\n",
       "8860  1980-12-15  27.375001  27.375001  27.250  27.250   43971200   0.411342\n",
       "8861  1980-12-12  28.750000  28.875000  28.750  28.750  117258400   0.433984"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ts_data.tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that the data that we imported has the order of observations in a descending order in respect to Date. We will deal with this later.\n",
    "\n",
    "Examine the data types for each of the columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts_data.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that the 'Date' column is an 'object' data type. This means that it has been interepreted as a 'string' rather than as a 'date' data type."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Converting into Time Series Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Below is an example of how we can convert a column that is interpreted as a string, into a datetime datatype."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts_data['Date'] = pd.to_datetime(ts_data['Date'], format='%Y-%m-%d')\n",
    "ts_data.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice the 'format' specification and how it fits exactly the format of the original string."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(ts_data.info())\n",
    "ts_data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Date column is now a datetime64 data type. Notice that the appearance of the Date column has not changed, which is why it is important to check that the data types are as you would like them to be for each column.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise:** Use the pd.read_clipboard() function to read the below data into a dataframe. Then convert the column Date into a datetime data type that has the format of Year/Month/Day as above:  "
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "Date\n",
    "03/02/2016\n",
    "02/02/2016\n",
    "01/02/2016\n",
    "29/01/2016\n",
    "28/01/2016\n",
    "27/01/2016\n",
    "26/01/2016\n",
    "25/01/2016\n",
    "22/01/2016\n",
    "21/01/2016"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "d = pd.read_clipboard()\n",
    "d"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can perform more powerful manipulation and processing if we make the Date column the index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "ts_data = ts_data.set_index(['Date'])\n",
    "ts_data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is important to know how to manually convert columns into datetime and make them into a dataframe index; however, when reading in a csv file, we can do all of the above automatically in future by specifying a couple of parameters. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts_data = pd.read_csv('../datasets/appleStockPrice.csv', index_col='Date', parse_dates=True)\n",
    "print(ts_data.info())\n",
    "ts_data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Filtering Time Series Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts_data['2016']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts_data['2016-2'] "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts_data.loc['2016-2']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise:** Filter the above dataframe to only display values from October 2015 to December 2015. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given that the index is in the 'wrong' order, it makes it somewhat less intuitive to work with.\n",
    "\n",
    "We can reorder the index to make things easier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts_data.sort_index(ascending=True, inplace=True)\n",
    "ts_data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise:** Filter the above dataframe to only display values after January 15 2015.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Filtering can also be done through a *truncate()*. Truncate is simply a convenience function that is equivalent to slicing. Below is an example of filtering data to just December 2015 and January 2016 observations:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "ts_data.truncate(before='2015-12-1', after='2016-1-31')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise:** Use the truncate function to filter the above dataframe to only display values after November 2015."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Resampling"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Resampling transforms time series data into a different frequency (e.g., converting hourly data into daily data). Pandas provide and easy way to perform these frequency conversion operations which are extremely common in  financial applications, but not limited to them only.\n",
    "\n",
    "Resampling requires that 1) the resampling time period is specified, 2) the method to apply to the resampled data (default is mean). For those familiar with SQL, resampling is essentially a time-based **groupby** operation, followed by a reduction method on each of its groups. \n",
    "\n",
    "Reduction can be: 'mean','median','sum','min','max','first','last','ohlc' or other available numpy/user defined transformation.\n",
    "\n",
    "A variety of built-in reduction time frequencies are available:"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "Alias  Frequency\n",
    "    B: business day frequency\n",
    "    D: calendar day frequency\n",
    "    W: weekly frequency\n",
    "    M: month end frequency\n",
    "    BM: business month end frequency\n",
    "    MS: month start frequency\n",
    "    BMS: business month start frequency\n",
    "    Q: quarter end frequency\n",
    "    BQ: business quarter endfrequency\n",
    "    QS: quarter start frequency\n",
    "    BQS: business quarter start frequency\n",
    "    A: year end frequency\n",
    "    BA: business year end frequency\n",
    "    AS: year start frequency\n",
    "    BAS: business year start frequency\n",
    "    H: hourly frequency\n",
    "    T: minutely frequency\n",
    "    S: secondly frequency\n",
    "    L: milliseonds\n",
    "    U: microseconds"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "ts_data['2015'].resample('M').mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise:** Resample the date above dataframe based on the quarter end frequency on data between 1990 and 2010 using the median as the reduction method. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise:** Describe what the output of the below means? "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "ts_data['Adj Close'].resample('A').ohlc()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Visualising the data is as simple as calling *plot()* on the required column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts_data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "rcParams['figure.figsize'] = 25, 10\n",
    "#ts_data[['Volume']].resample('M', how='sum').plot()\n",
    "plt.plot( ts_data[['Volume']].resample('M').sum())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can increase the size of the plot and render several plots at the same time:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "rcParams['figure.figsize'] = 15, 10\n",
    "ts_data.plot(subplots=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise:** The period leading to the recent global financial crisis and the immediate aftermath are interesting to look at into more detail from the perspective of the adjusted closing price and the total volume of shares traded for Apple. Render separately two plots for these columns for data from 2007 to 2010."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use resampling to reduce the frequency of Apple share trading to annual and plot the historical variation between the min/max and the mean prices for Apple shares in each year:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.plot( ts_data[['Adj Close']].resample('A').mean())\n",
    "plt.plot( ts_data[['Adj Close']].resample('A').min())\n",
    "plt.plot( ts_data[['Adj Close']].resample('A').max())\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise:** Render a graph that is the same as above, only this time use a 5 year frequency:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Groupby operation on timeseries "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "ts_data.groupby(pd.Grouper(freq='A')).sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def up_or_down_closing_price(x):\n",
    "    if x.Close - x.Open > 0 : return 'up'\n",
    "    elif x.Close - x.Open < 0 : return 'down'\n",
    "    else : return 'no change'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "ts_data['closing_change'] = ts_data.apply(up_or_down_closing_price, axis=1)\n",
    "ts_data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "ts_data.groupby([pd.Grouper(freq='M'), 'closing_change' ]).count()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise:** Create a column that is the difference between the closing and opening prices for each day. Then perform a groupby operation for each month on the 'closing_change' columns and sum up all the upward and downward movements for each of the months."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Moving (rolling/running) statistics"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A rolling average is a series of averages of different subsets of the full data set as defined by a filter window.\n",
    "\n",
    "It is widely used indicator that helps smooth out price movements by filtering out the noise from random fluctuations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "pd.DataFrame.rolling?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts_data[['Adj Close']].rolling?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts_data[['Adj Close']].rolling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "ts_data[['Adj Close']].rolling(5, min_periods=1).mean().head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts_data[['Adj Close']].rolling(5).mean().plot(style='-g')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise:** Generate rolling mean plots on the Volume column for the Apple share trading data. Determine the most 'useful' window size."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Shift operations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "“Shifting” refers to moving data backward and forward through time. Both Series and\n",
    "DataFrame have a  shift method for performing this operation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we wanted to calculate the difference in oil price from one year to the next (something very common in time series analysis), then pandas provides for us a method called shift(), which allows us to select a column and move the data in it up or down by a given amount. \n",
    "\n",
    "In our case, we want to see the difference between the values in price from one year to the next so we will shift the columns by one."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts_data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "\n",
    "ts_data['shifted'] = ts_data['Adj Close'].shift(1)\n",
    "ts_data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise**: Plot the positive and negative fluctuations of the adjusted apple stock price which has been resampled to a weekly frequency using the mean."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Creating new features "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts_data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a slope coefficient on the closing price over a 30 day period, using 7 day moving average: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts_data['adj_close_7_moving_average'] = ts_data[['Adj Close']].rolling(7).mean()\n",
    "ts_data.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts_data['adj_close_30_day_slope_on_moving_7_avg'] = (ts_data['adj_close_7_moving_average'] - ts_data['adj_close_7_moving_average'].shift(30)) / 30\n",
    "ts_data.tail()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ts_data.adj_close_30_day_slope_on_moving_7_avg.plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Exercise**: Devise a new feature for the stock movement data and implement it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "### Exercise: "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Go to https://inflationdata.com/articles/inflation-adjusted-prices/historical-crude-oil-prices-table/ and read in the Annual Average Domestic Crude Oil Prices table. \n",
    "\n",
    "Make sure you clean the data.\n",
    "\n",
    "Convert the 'Year' feature into datetime and set it as the index.\n",
    "\n",
    "Perform the same analysis as above using the 'shift' function and plot the Inflation Adjusted Price difference from one year to the next."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:teaching]",
   "language": "python",
   "name": "conda-env-teaching-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
